| Technique | Benefits | Hardware |
|---|---|---|
| Dynamic range quantization | 4x smaller, 2x-3x speedup | CPU |
| Full integer quantization | 4x smaller, 3x+ speedup | CPU, Edge TPU, microcontrollers |
| Float16 quantization | 2x smaller, GPU acceleration | CPU, GPU |
You can quantize a trained TensorFlow model when converting it to the TensorFlow Lite format with the TensorFlow Lite Converter. PyTorch also provides its own quantization implementation.
The examples below can be run in Colab; note that TensorFlow >= 1.15 is required.
Load the dataset via tf.keras.datasets.mnist and build a CNN model. Save the pre-quantization weights as baseline_weights.h5 and the full model as non_quantized.h5, recording model size and accuracy as the baseline for the post-training quantization comparison.
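A minimal sketch of that baseline setup. The article does not spell out the architecture, so the layers below are illustrative; the names `baseline_model`, `baseline_weights.h5`, and `non_quantized.h5` come from the text.

```python
import tensorflow as tf

# Illustrative small CNN for 28x28 MNIST digits (actual architecture may differ).
baseline_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Reshape((28, 28, 1)),
    tf.keras.layers.Conv2D(12, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
baseline_model.compile(optimizer='adam',
                       loss='sparse_categorical_crossentropy',
                       metrics=['accuracy'])

# Train on MNIST, then save the artifacts the later comparison needs:
#   (train_images, train_labels), (test_images, test_labels) = \
#       tf.keras.datasets.mnist.load_data()
#   baseline_model.fit(train_images / 255.0, train_labels, epochs=5)
#   baseline_model.save_weights('baseline_weights.h5')
#   baseline_model.save('non_quantized.h5')
```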
TensorFlow Lite uses the *.tflite format; convert the baseline_model built earlier with tf.lite.TFLiteConverter.from_keras_model.
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
tflite_model = converter.convert()
with open('non_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
Next, build a function for evaluating the accuracy of a TF Lite model. Once a model is converted to .tflite, evaluation needs dedicated code; the helper below is adapted from the official example.
# A helper function to evaluate the TF Lite model using "test" dataset.
# from: https://www.tensorflow.org/lite/performance/post_training_integer_quant_16x8#evaluate_the_models
def evaluate_model(filename):
  # Load the model into the interpreter.
  interpreter = tf.lite.Interpreter(model_path=str(filename))
  interpreter.allocate_tensors()
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]
  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for test_image in test_images:
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)
    # Run inference.
    interpreter.invoke()
    # Post-processing: remove batch dimension and find the digit 
    # with highest probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)
  # Compare prediction results with ground truth labels to calculate accuracy.
  accurate_count = 0
  for index in range(len(prediction_digits)):
    if prediction_digits[index] == test_labels[index]:
      accurate_count += 1
  accuracy = accurate_count * 1.0 / len(prediction_digits)
  return accuracy
At this point the two models score essentially the same accuracy, while the model file is already somewhat smaller.
ACCURACY:
{'baseline Keras model': 0.9581000208854675, 
 'non quantized tflite': 0.9581}
MODEL_SIZE:
{'baseline h5': 98136, 
 'non quantized tflite': 84688}
# Dynamic range quantization
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # the only setting added
tflite_model = converter.convert()
with open('post_training_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
ACCURACY:
{'baseline Keras model': 0.9581000208854675,
 'non quantized tflite': 0.9581,
 'post training quantized tflite': 0.9582}
MODEL_SIZE:
{'baseline h5': 98136,
 'non quantized tflite': 84688,
 'post training quantized tflite': 24096} 
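The roughly 4x shrink comes from storing weights as int8 instead of float32. The per-tensor arithmetic behind dynamic range quantization can be sketched in plain NumPy; this mirrors the idea, not TFLite's exact kernels, and the weight values are made up for illustration.

```python
import numpy as np

# Hypothetical float32 weight tensor.
w = np.array([0.12, -0.7, 1.5, -1.49, 0.0], dtype=np.float32)

# Pick a per-tensor scale so the largest absolute weight maps to 127,
# then round every weight to an 8-bit integer.
scale = np.abs(w).max() / 127.0
w_int8 = np.round(w / scale).astype(np.int8)

# At inference the int8 weights are rescaled back to float on the fly,
# which is why activations can stay in floating point.
w_dequant = w_int8.astype(np.float32) * scale

print(w_int8)                          # int8 values in [-127, 127]
print(np.max(np.abs(w - w_dequant)))   # rounding error, at most scale / 2
```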
For quantization-aware training, the tensorflow_model_optimization module provides quantize_model() to do the job; fine-tune the wrapped model with epochs = 1.
ACCURACY:
{'baseline Keras model': 0.9581000208854675,
 'non quantized tflite': 0.9581,
 'post training quantized tflite': 0.9582,
 'quantization aware non-quantized': 0.1005999967455864}
MODEL_SIZE:
{'baseline h5': 98136,
 'non quantized tflite': 84688,
 'post training quantized tflite': 24096,
 'quantization aware non-quantized': 115680}